Bootstrapping Arabic-Italian SMT through comparable texts and pivot translation
Authors
Abstract
This paper describes efforts towards the development of an Arabic-to-Italian SMT system for the news domain. Since very little parallel data is available for this language pair, we investigated both the exploitation of comparable corpora and pivot translation. Experimental evaluation was conducted on a new benchmark developed by extending two Arabic-to-English NIST evaluation sets. Preliminary results show the potential of both approaches relative to the performance achieved by a popular state-of-the-art Web-based translation service.
Related resources
Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora
Ebrahim Ansari et al. 2017. Using English as pivot to extract Persian-Italian parallel sentences from non-parallel corpora. In "Applications of Comparable Corpora", edited book, Berlin Linguistic Press (ed.). The effectiveness of a statistical machine translation (SMT) system depends heavily on the size of the parallel corpus used in the training phase. For low-resource l...
Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation
An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low-quality translations. In this paper, we present two language-independent features...
Alignment Symmetrization Optimization Targeting Phrase Pivot Statistical Machine Translation
An important step in mainstream statistical machine translation (SMT) is combining bidirectional alignments into one alignment model. This process is called symmetrization. Most symmetrization heuristics and models focus on direct translation (source-to-target). In this paper, we present a symmetrization heuristic relaxation to improve the quality of phrase-pivot SMT (source-[pivot]-t...
Selective Combination of Pivot and Direct Statistical Machine Translation Models
In this paper, we propose a selective combination approach of pivot and direct statistical machine translation (SMT) models to improve translation quality. We work with Persian-Arabic SMT as a case study. We show positive results (from 0.4 to 3.1 BLEU on different direct training corpus sizes) in addition to a large reduction of pivot translation model size.
Morphological Constraints for Phrase Pivot Statistical Machine Translation
The lack of parallel data for many language pairs is an important challenge to statistical machine translation (SMT). One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low-quality translations, especially when a morphologically poor language is used as the pi...